Periodic hierarchical load balancing for large supercomputers
Authors
Abstract
Large parallel machines with hundreds of thousands of processors are being built. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and the poor solutions of traditional distributed schemes. This is done by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with scalability challenges of load balancing at very large scale. We show performance data of the hierarchical load balancing method on up to 16,384 cores of the Ranger cluster (at TACC) and 65,536 cores of a Blue Gene/P at Argonne National Laboratory for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
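The idea of load balancing domains that form a tree can be illustrated with a small sketch. This is not the paper's algorithm and not the Charm++ API: the function names (`greedy_assign`, `balance`), the `fanout` parameter, and the representation of a task as a bare load number are all assumptions made for illustration, and communication and migration costs are ignored. Each internal domain only shifts work between its child domains based on their total loads, and only the leaf domains, which are small, run a centralized greedy assignment.

```python
import heapq


def greedy_assign(tasks, n_bins):
    """Centralized greedy step: heaviest task first, always to the lightest bin."""
    bins = [[] for _ in range(n_bins)]
    heap = [(0.0, i) for i in range(n_bins)]  # (current load, bin index)
    for task in sorted(tasks, reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(task)
        heapq.heappush(heap, (load + task, i))
    return bins


def total(domain):
    """Sum of all task loads on all processors of a domain."""
    return sum(sum(proc) for proc in domain)


def balance(domain, fanout):
    """Hypothetical hierarchical balancer.

    domain: list of processors, each a list of non-negative task loads.
    fanout: maximum number of child domains per tree node (must be >= 2).
    Returns a new list of processors of the same length.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    n = len(domain)
    if n <= fanout:
        # Leaf domain: small enough for a centralized greedy assignment.
        return greedy_assign([t for proc in domain for t in proc], n)

    # Split the processors into at most `fanout` contiguous child domains.
    size = -(-n // fanout)  # ceiling division
    children = [[list(p) for p in domain[i:i + size]] for i in range(0, n, size)]

    # Overloaded children donate tasks until they reach their share.
    per_proc = total(domain) / n
    pool = []
    for child in children:
        target = per_proc * len(child)
        while total(child) > target:
            donor = max(child, key=sum)   # most loaded processor in the child
            task = min(donor)             # give away its smallest task
            donor.remove(task)
            pool.append(task)

    # Donated tasks go to the child with the lowest load per processor.
    for task in sorted(pool, reverse=True):
        child = min(children, key=lambda c: total(c) / len(c))
        min(child, key=sum).append(task)

    # Each child domain is then balanced independently, one level down.
    result = []
    for child in children:
        result.extend(balance(child, fanout))
    return result
```

Calling `balance(procs, fanout=2)` on a list of per-processor task lists returns the rebalanced lists. In this sketch the child domains are processed one after another; in a real system they would be handled in parallel by separate processors, which is where the scalability benefit over a single centralized balancer would come from.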
Similar resources
Hierarchical Partitioning and Dynamic Load Balancing for Scientific Computation
Cluster and grid computing has made hierarchical and heterogeneous computing systems increasingly common as target environments for large-scale scientific computation. A cluster may consist of a network of multiprocessors. A grid computation may involve communication across slow interfaces. Modern supercomputers are often large clusters with hierarchical network structures. For maximum efficien...
A Load Balance Methodology for Highly Compute-Intensive Applications on Grids Based on Computational Modeling
An alternative to the use of traditional supercomputers in parallel compute-intensive applications. Pools of servers, storage systems and networks in a large virtual computer system. An optimal load balancing strategy is critical in a Grid environment. Avoid processing delays and overcommitment of resources. Take into account the different computational power of each node that changes dynamical...
Load Balancing for Parallel Computing on Distributed Computers
Distributed processing can be used for solving large computation intensive problems. A distributed system may include parallel supercomputers, networked workstations and PCs. This paper discusses load balancing of a parallel job in a distributed computation environment. The information necessary for load balancing is studied. The software tools that automatically collect the information and per...
Development and applications of a large scale fluids/structures simulation process on clusters
A modular process for efficiently solving large-scale multidisciplinary problems using single-image cluster supercomputers is presented. The process integrates disciplines with diverse physical characteristics while retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrat...
A Hierarchical Shared Memory Cluster Architecture with Load Balancing and Fault Tolerance
Recently a great deal of attention has been paid to the design of hierarchical shared memory cluster systems. Cluster computing has made hierarchical computing systems increasingly common as target environments for large-scale scientific computations. This paper proposes a hierarchical shared memory cluster architecture with load balancing and fault tolerance. Hierarchies of shared memory and cache...
Journal title:
- IJHPCA
Volume 25, Issue
Pages -
Publication date 2011